LFMM analyses

Candidate SNP identification & Genomic offset predictions

Author

Juliette Archambeau

Published

October 3, 2023

1 Introduction

Document based on Gain and François (2021) and the LEA tutorial provided during the summer school Software and Statistical Methods for Population Genetics (SSMPG 2022; Aussois, September 19-23 2022).

The latent factor mixed model (LFMM) is a multivariate mixed regression model that estimates simultaneously the effects of environmental variables (fixed effects) and unobserved confounders called latent factors (Frichot et al. 2013; Caye et al. 2019). The latent factors are computed both from the genomes and from their environment. They are not representing neutral population structure (i.e. they have less direct interpretations than in ancestry estimation methods). Instead, they can be interpreted as the best estimates of the confounding effects of neutral population structure, leading to environmental effect size estimates with minimal bias.

2 Population structure

COMMENT: this section is not used in the paper (=> can be skipped!). Indeed, the analyses in this section are only useful if we want to estimate the population structure and impute missing data with the LEA package. In our study, we use the population structure estimates (the ancestry coefficients) from Jaramillo-Correa et al. (2015), and we impute the missing data based on the most common allele in the gene pool. And in the RDA, we do not use the ancestry coefficients to account for population structure; we use PC scores.

From the LEA package manual: The function snmf of the lfmm2 package estimates admixture coefficients using sparse Non-Negative Matrix Factorization algorithms, and provides STRUCTURE-like outputs. The input file has to be in the geno format, with one row for each SNP. Each row contains 1 character for each individual: 0 means zero copy of the reference allele. 1 means one copy of the reference allele. 2 means two copies of the reference allele. 9 means missing data.

Code
geno <-  read.csv(here("data/DryadRepo/FormattedFilteredGenomicData_AlleleCounts_withmaf.csv")) %>% 
  column_to_rownames("snp_ID") %>% 
  dplyr::mutate(across(everything(), ~replace_na(.x, 9))) %>% 
  t() %>% 
  as.data.frame()

geno %>% write.geno(here("data/LEAanalysis/AlleleCounts_GenoFormat.geno"))

kable_mydf(geno[1:10,1:8], boldfirstcolumn = F)

From Gain and François (2021):

Assuming \(n\) diploid organisms genotyped at \(L\) loci, the snmf algorithm decomposes the \(n × L\) matrix of observed allele frequencies, \(P\), in a product of two probabilistic matrices:

\[P \approx QF\]

where the coefficients of \(P\) take their values in {0,1/2,1} for diploid organisms (note: ploidy can be modified in snmf). The matrix \(Q\) is similar to the \(Q\) matrix of STRUCTURE, representing ancestry coefficients for individuals originating from \(K\) source populations (Pritchard, Stephens, and Donnelly 2000), and is a \(n × K\) matrix. The matrix \(F\) contains allele frequencies at each locus for each source population and is a \(K × 3L\) matrix. While \(P\) may contain some missing values, the product matrix, \(QF\), is always a complete probabilistic matrix. \(P\) is a \(n × 3L\) matrix.

Comment: The matrix \(F\) is called the matrix \(G\) in the LEA package and in Frichot et al. (2013).

Code
proj_snmf <- snmf(here("data/LEAanalysis/AlleleCounts_GenoFormat.geno"), 
                  K=1:10, 
                  repetitions = 10, # nb of repetitions for each K
                  entropy = TRUE,
                  project="new")
Code
proj_snmf <- load.snmfProject(here("data/LEAanalysis/AlleleCounts_GenoFormat.snmfProject"))

We plot the cross-entropy criterion of all run of the project.

Code
# We save the figure for the SI
pdf(width = 5, height = 4, here("figs/LFMM/CrossEntropyPlot.pdf"))
par(mar =c(4.5,4,1,1))
plot(proj_snmf, cex = 1.2, col = "lightblue", pch = 19)
dev.off()
png 
  2 
Code
plot(proj_snmf, cex = 1.2, col = "lightblue", pch = 19)

We get the cross-entropy of the 10 runs for K = 6 and we select the run with the lowest cross entropy

Code
ce <- cross.entropy(proj_snmf, K = 6)

best <- which.min(ce)

We display the Q-matrix.

Code
gp_colors <- c("orangered3","gold2","darkorchid3","navyblue","turquoise2","green3") # define the colors of the gene pools

bp <- barchart(proj_snmf, K = 6,  run= best,
               border = NA, space = 0, col = gp_colors, 
               xlab = "Individuals", ylab = "Ancestry proportions", 
               main = "Ancestry matrix")
        
axis(1, at = 1:length(bp$order), labels = bp$order, las = 3, cex.axis = .4)

We use the G function to extract the ancestral genotype frequency matrix, \(G\), for the 2nd run for K = 6.

Code
Gmatrix <- G(proj_snmf, K = 6, run = best)

Gmatrix[1:5,1:6] %>% 
  as.data.frame() %>% 
  kable_mydf(boldfirstcolumn = F)
V1 V2 V3 V4 V5 V6
1 1 1.0 0.44 1.00 1.00
0 0 0.0 0.48 0.00 0.00
0 0 0.0 0.08 0.00 0.00
1 1 0.7 0.00 0.68 0.34
0 0 0.3 0.38 0.27 0.47

3 Climatic data

We load the population-specific climatic information for the climatic variables of interest. To run lfmm2, individuals (genotypes in our case) have to be in rows and climatic variables in columns.

The past and future climatic data have been scaled with the parameters (mean and variance) of the past climatic data (which is done by the function generate_clim_datatsets).

Code
# Selected climatic variables
# ===========================
clim_var <- readRDS(here("data/ClimaticData/NamesSelectedVariables.rds"))


# Past and future climatic data
# =============================
source(here("scripts/functions/generate_scaled_clim_datasets.R"))
clim_dfs <- generate_scaled_clim_datasets(clim_var)

We attribute climatic values to each genotype, i.e. genotypes from the same populations will have the same climatic values.

Code
genotypes <- geno %>% 
  rownames_to_column("clon") %>% 
  dplyr::select(clon) %>% 
  as_tibble()

clim_ref <- genotypes %>% 
  mutate(pop = str_sub(clon,1,3)) %>% 
  left_join(clim_dfs$clim_ref, by="pop") %>% 
  dplyr::select(-pop,-clon)

4 Estimating gene-environment associations

4.1 Input data

4.1.1 Genomic data

We load the genomic data. The genomic has to be allele counts without missing data, with individuals (genotypes) in rows and SNPs in columns.

Code
# we load the imputed genomic data with allele counts (and without MAF)
geno <-  read.csv(here("data/DryadRepo/ImputedGenomicData_AlleleCounts_withoutmaf.csv")) %>% 
  column_to_rownames("snp_ID") %>% 
  t() %>% 
  as_tibble()

4.2 Run lfmm2

From Caye et al. (2019): LFMMs are regression models combining fixed and latent effects as follows:

\[ \mathbf{Y} = \mathbf{XB}^T + \mathbf{W} + \mathbf{E}\]

\(\mathbf{Y}\) is the response matrix, which records data for \(n\) individuals genotyped for \(p\) genetic markers. \(\mathbf{X}\) is the matrix of the environmental or primary variables. Nuisance variables such as observed confounders can be included in the \(\mathbf{X}\) matrix, which dimension is then \(n \times d\), where \(d\) represents the total number of primary and nuisance variables. The fixed effect sizes are recorded in the \(\mathbf{B}\) matrix, which has dimension \(p \times d\). The \(\mathbf{E}\) matrix represents residual errors, and it has the same dimensions as the response matrix. The matrix \(\mathbf{W}\) is a “latent matrix” of rank \(K\),defined by \(K\) latent factors. The \(K\) latent factors represent unobserved confounders which are modeled through an \(n \times K\) matrix, \(\mathbf{U}\).The matrix \(\mathbf{U}\) is obtained from a singular value decomposition (SVD) of the matrix \(\mathbf{W}\) as follows

\[\mathbf{W} = \mathbf{UV}^T \]

where \(\mathbf{V}\) is a \(p \times K\) matrix of loadings. The \(\mathbf{U}\) and \(\mathbf{V}\) matrices are unique up to arbitrary signs for the factors and loadings.

As there are 6 gene pools in maritime pine (all represented in our population sample), we run the LFMM model with K=6.

Code
mod_lfmm2 <- lfmm2(input = geno,
                   env = clim_ref, 
                   K = 6)

The function lfmm2 returns an object of class lfmm2Class that contains the \(\mathbf{U}\) and \(\mathbf{V}\) matrices.

4.3 Calibration issues

With the function lfmm2.test, we can get a vector of p-values for association between loci and climatic variables adjusted for latent factors computed by lfmm2.

The full option:

  • If FALSE, the function lfmm2.test computes significance values (p-values) from standard Student tests for each climatic variable.

  • If TRUE, the function lfmm2.test returns p-values for the full set of climatic variables (a single value at each locus) using Fisher tests.

The genomic.control option:

  • If TRUE (default option), the p-values are recalibrated by using genomic control after correction for confounding.

  • If FALSE, the p-values are not recalibrated.

We can check if the p-values are well calibrated or not with the histograms of the p-values: ideally, they should be flat with a peak close to zero. in the two graphs below, we show the distribution of the non-calibrated (left graph) and calibrated (right graph) p-values. We can see that it is important to set the genomic.control to its default value TRUE if we want the p-values to be well calibrated.

Code
par(mfrow=c(1,2))

# Histogram of non-calibrated p-values
# ------------------------------------
lfmm2.test(object = mod_lfmm2, input = geno, env = clim_ref, full = TRUE, genomic.control = FALSE)$pvalues %>% 
hist(col = "orange", 
     main="Histogram of non-calibrated p-values",
     xlab="p-values")

# Histogram of calibrated p-values
# --------------------------------
lfmm2.test(object = mod_lfmm2, input = geno, env = clim_ref, full = TRUE, genomic.control = TRUE)$pvalues %>% 
hist(col = "orange", 
     main="Histogram of calibrated p-values",
     xlab="p-values")

In the following analyses, we use the default genomic.control=TRUE and full=TRUE, so all climatic variables are used in the test.

Code
test_lfmm2 <- lfmm2.test(object = mod_lfmm2,
                       input = geno,
                       env = clim_ref, 
                       full = TRUE,
                       genomic.control = TRUE)

pv_lfmm2 <- test_lfmm2$pvalues

plot(-log10(pv_lfmm2 ), 
     cex = .3, 
     col = "blue",
     xlab = "Locus",  
     ylab = "-log10(P)", 
     main="Manhattan plot of log10 p-values")

4.4 Multiple testing and calibration issues

We want to use the FDR control algorithm to correct for multiple testing and determine which loci have significant levels of association. The False Discovery Rate (FDR) is defined as:

FDR = prob(False Discovery | Positive test) = q.

The FDR algorithm requires that the tests are correctly calibrated, i.e. that the distribution of p-values is uniform when we assume that the null hypothesis, \(H0\), is correct. That’s ok in our case, we have already checked it above with the histogram of p-values.

Which FDR level do we use?

Code
fdr_level <- 0.05

To identify the candidate SNPs, we apply the chosen FDR control to the p-values, which converts them into q-values. And then we identify candidates as those with q-values below a given FDR threshold.

The candidate SNPs at the FDR level of 5% are shown with circles on the Manhattan plot below. The orange line corresponds to the Bonferroni threshold for a type I error of 10%.

Code
# applying FDR control method to obtain q-values
qv_lfmm2  <- qvalue::qvalue(pv_lfmm2, fdr.level = fdr_level)

# Manhattan plot
plot(-log10(pv_lfmm2 ), 
     cex = .3, 
     col = "blue",
     xlab = "Locus",  
     ylab = "-log10(P)", 
     main="Manhattan plot of log10 p-values")

# Show with an orange line the Bonferonni multiple testing threshold for significance
abline(h = -log10(0.1/ncol(geno)), col = "orange")

# Extract the list of candidates
candidates <- which(qv_lfmm2$significant)

# Show with circles the list of candidate loci at the chosen FDR level
points(candidates, -log10(pv_lfmm2)[candidates], cex = .9, col = "brown")

Code
# We save the Manhattan plot for the Supplementary Information
# ============================================================

pdf(width = 8, height = 5.5, here("figs/LFMM/SI_ManhattanPlot.pdf"))
par(mfrow=c(1,1))
plot(-log10(pv_lfmm2 ), 
     cex = .3, 
     col = "gray",
     xlab = "Locus",  
     ylab = "-log10(p-values)", 
     main="Manhattan plot")
abline(h = -log10(0.1/ncol(geno)), col = "orange")
points(candidates, -log10(pv_lfmm2)[candidates], cex = 1, col = "brown")
dev.off()
Code
# We save the list of candidate SNPs for further analyses
# =======================================================
candidates <- names(geno)[candidates]

saveRDS(candidates, here("outputs/LFMM/candidates.rds"))

We obtain 120 candidate SNPs.

5 Genomic offset predictions

This part is entirely based on the LEA tutorial provided during SSMPG 2022.

Until now, methods to estimate the genomic offset has defined it as a distance in the genetic space (i.e. distance between allele frequencies). In the R packae LEA, the genomic offset is alternatively defined as a distance in the environmental space, i.e. measures of genomic offset are linked to the geometry of the ecological niche. This new definition of the genomic offset is referred as genetic gap and is implemented in the genetic.gap function of the LEA package.

The genetic gap is based on the estimates of environmental effect sizes obtained from an LFMM. The relationship inferred in the GEA is then used to fit and predict allelic variation at all genomic loci, alleviating the need for a set of candidate loci and the choice of a significance level.

Below, we will estimate the genetic gap for:

  • the sets of candidate SNPs

  • the control set of SNPs

  • all SNPs (as advised by the developers of the genetic.gap function)

5.1 Loading future climatic data

We load the predicted values of the climatic variables for the years 2041-2070 that have been scaled (i.e. mean-center) with the same scaling parameters as the one used to mean-center the annual climatic values across the period 1901-1950.

Code
# Attribute climatic values for each genotype
list_clim_pred <- lapply(clim_dfs[[2]], function(clim_pred){

genotypes %>% 
  mutate(pop = str_sub(clon,1,3)) %>% 
  left_join(clim_pred, by="pop") %>% 
  dplyr::select(-pop,-clon,-gcm)
  
})

5.2 GO predictions using all SNPs

We run the genetic.gap function.

Code
ggap_all_snps <- lapply(list_clim_pred, function(clim_pred){

ggap <- genetic.gap(input = geno,
                    env = clim_ref,
                    pred.env = clim_pred,
                    K = 6)

# we keep one value per population
ggap[[1]] <- unique(ggap[[1]])
ggap[[2]] <- unique(ggap[[2]])

names(ggap)[[1]] <- "go"

return(ggap)

})

The outputs of the genetic.gap function are (copy-and-pasted from the LEA manual):

  • offset: a vector of genomic offset values computed for every sample location in new.env and pred.env. Note that the genomic offset is referred to as the genetic gap in Gain and François (2021) and the geometric GO in Gain et al. (2023).

  • distance: a vector of environmental distance values computed for every sample location in new.env and pred.env. The distances to an estimate of the risk of nonadaptedness that includes correction for confounding factors and analyzes multiple predictors simultaneously (modified version of RONA).

  • eigenvalues: eigenvalues of the covariance matrix of LFMM effect sizes. They represent the relative importance of combinations of environmental variables described in vectors when the environmental data have similar scales. To be used with scale == TRUE.

  • vectors: eigenvectors of the covariance matrix of LFMM effect sizes representing combinations of environmental variables sorted by importance (eigenvalues).

5.2.0.1 Relationship with Euclidean climatic distance

In the supplementary information of Gain et al. (2023), the authors say that ‘If (and only if) the eigenvalues of the covariance matrix are equal, the geometric GO is proportional to the squared Euclidean distance between environmental predictors.’ Note that a variable \(y\) is proportional to a variable \(x\) if there is a non-zero constant \(k\) such as \(y = kx\).

In this part, we look at the relationship between the squared root of the genetic gap vs the Euclidean climatic distance (or the genetic gap and the squared Euclidean climatic distance).

The Euclidean climatic distance is calculated as follows:

\[ D_{clim} = \sqrt{\sum_{i=1}^{N} (X_{past,i} - X_{fut,i})^2} \]

with \(N\) the number of selected climatic variables, \(X_{past,i}\) the mean value of the climatic variable \(i\) across the period 1901-1950 and \(X_{fut,i}\) the predicted mean value of the climatic variable \(i\) across the period 2041-2070.

Code
# Calculate the Euclidean climatic distance
list_dist_env <- lapply(list_clim_pred, function(clim_pred){
  
Delta = clim_ref - clim_pred  

dist_env = unique(sqrt(rowSums(Delta^2)))

})
Code
# Incorporating gene pool information
# -----------------------------------

# In the graphs below, we want to color the populations according to the main gene pool they belong to

# we load the main gene pool information for each clone
gps <- readRDS(here("data/GenomicData/MainGenePoolInformation.rds"))[[1]] %>% arrange(pop)

Correlation between the squared Euclidean climatic distance and the genetic gap (offset in the outputs):

Code
lapply(names(ggap_all_snps), function(gcm){

cor(ggap_all_snps[[gcm]]$go,list_dist_env[[gcm]]^2)
    
}) %>% 
  setNames(names(ggap_all_snps)) %>%  
  bind_rows(.id="GCM") %>% 
  kable_mydf()
GFDL-ESM4 IPSL-CM6A-LR MPI-ESM1-2-HR MRI-ESM2-0 UKESM1-0-LL
0.84 0.87 0.91 0.84 0.83

Comment: The squared Euclidean climatic distance and the genetic gap are somewhat correlated, so it means that the genetic gap \(GO\) can be expressed as follows: \(GO = a \times dist^2 + b + e\) with \(dist\) being the Euclidean climatic distance and the \(e\) the residuals (ie the noise). Note that \(GO\) and \(dist^2\) are correlated but not proportional to!

We can plot the eigenvalues with a barplot to evaluate the relative importance of the climatic variables.

Code
# the relative importance is the same for all genomic offset predictions at it is only influenced by past climates, and not future climates
barplot(ggap_all_snps[[1]]$eigenvalues, col = "orange", xlab = "Axes", ylab = "Eigenvalues") 

We see that only two dimensions of the climatic space influence the genetic gap. We can also see it with the loadings for the first two combinations of variables (vectors in the outputs of the genetic.gap function), which indicate the relative contribution of the variables to local adaptation. We see that the first two variables had increased importance compared to the last ones (the variables are sorted by importance in the vectors outputs, so we do not know which variables they are).

Code
# Function to build the plots 
source(here("scripts/functions/make_eucli_plot.R"))
Code
par(mfrow=c(1,2))

# plot the squared root of the genetic gap vs Euclidean environmental distance
lapply(names(ggap_all_snps), function(gcm){

  make_eucli_plot(X = list_dist_env[[gcm]],
          Y = sqrt(ggap_all_snps[[gcm]]$go),
          colors = gps$color_main_gp_pop, 
          color_names = gps$main_gp_pop,
          ylab = "sqrt(genetic gap)",
          plot_title = paste0("All SNPs - ",gcm))  
})

Code
# We generate scatter plots for the Supplementary Information.
# ============================================================

# Axis limits
# ===========
max_go <- lapply(names(list_dist_env), function(gcm) ggap_all_snps[[gcm]]$go) %>% 
  unlist() %>% max()
max_go <- max_go + 0.002
range_eucli <- list_dist_env %>% unlist() %>% range()

# Run the function
# ================
p <- lapply(names(list_dist_env), function(gcm){
  
make_ggscatterplot(
  x = list_dist_env[[gcm]],
  y = ggap_all_snps[[gcm]]$go,
  title=gcm,
  range_eucli = range_eucli,
  max_go = max_go)

})

# remove y-labels to graphs in the second column
p[[2]] <- p[[2]] + ylab("")
p[[4]] <- p[[4]] + ylab("")


# remove x-labels to graphs in the second and third rows
p[[1]] <- p[[1]] + xlab("")
p[[2]] <- p[[2]] + xlab("")
p[[3]] <- p[[3]] + xlab("")

p[[6]] <- get_legend(p[[1]])

for(i in 1:5){p[[i]] <- p[[i]]  +  theme(legend.position = "none")} 


plot_grid(plotlist=p, nrow = 3) %>% 
  ggsave(here(paste0("figs/LFMM/ScatterPlotEucliDistance_AllSNPs.pdf")), 
         .,
         width=7,
         height=8,
         device="pdf")

5.2.0.2 Maps

Code
# Function to make the genomic offset maps
source(here("scripts/functions/make_go_map.R"))

# Population coordinates
pop_coord <-  readRDS(here(paste0("data/ClimaticData/MaritimePinePops/ClimatePopulationLocationPointEstimates_ReferencePeriods_noADJ.rds")))[[1]]$ref_means %>% dplyr::select(pop,longitude,latitude)

# Find minimum and maximum values of genomic offset for the maps
go_limits <- lapply(names(ggap_all_snps), function(gcm){
  ggap_all_snps[[gcm]]$go
}) %>%  unlist() %>% range()
# The minimum GO value is very very small, almost zero, so we fix it to zero.
go_limits[[1]] <- 0

list_go_all_snps <- list(go = lapply(names(ggap_all_snps), function(gcm){ggap_all_snps[[gcm]]$go}) %>% setNames(names(ggap_all_snps)))

# Generate the maps for each set of SNPs and each GCM
go_maps <- lapply(names(ggap_all_snps), function(gcm){
  
make_go_map(dfcoord=pop_coord,
            snp_set = list_go_all_snps,
            gcm=gcm,
            ggtitle=gcm,
            go_limits = go_limits,
            point_size = 3)

})

legend_maps  <- get_legend(go_maps[[1]])

go_maps <- lapply(go_maps, function(y) y + theme(legend.position = "none"))

go_maps$legend_maps <- legend_maps

go_maps <-plot_grid(plotlist=go_maps)

# =====================================================
# We save the figures for the Supplementary Information
# =====================================================
ggsave(here("figs/LFMM/GOMaps_PopLocations_AllSNPs_SI.pdf"), go_maps, width=10,height=6, device="pdf")


# =========
# Add title
# =========
title <- ggdraw() + 
  draw_label(
    "All SNPs",
    fontface = 'bold',
    x = 0,
    hjust = 0
  ) +
  theme(plot.margin = margin(0, 0, 0, 7))

# merge title and plots
plot_grid(
  title, go_maps,
  ncol = 1,
  rel_heights = c(0.1, 1))

5.2.0.3 Validation - NFI plots

We load the climatic data of the NFI plots.

Code
# Load the climatic data of the NFI plots.
nfi_clim <- readRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds"))

# Keep only the climatic variables of interest and scale the climatic data
nfi_dfs <- generate_scaled_clim_datasets(clim_var, clim_ref = nfi_clim$clim_ref, clim_pred = nfi_clim$clim_survey)

We predict the genomic offset at the location of the NFI plots.

Code
go_nfi <- genetic.gap(input = geno,
                      env = clim_ref,
                      new.env = nfi_dfs$clim_ref[,-1],
                      pred.env = nfi_dfs$clim_pred[,-1],
                      K = 6)

# Find minimum and maximum values of genomic offset for the maps
go_limits <- go_nfi$offset %>%  range()
# The minimum GO value is very very small, almost zero, so we fix it to zero.
go_limits[[1]] <- 0


# map genomic offset predictions in the NFI plots 
go_allsnps <- list(go_nfi = go_nfi$offset,
                   set_name = "All SNPs")

p <- make_go_map(
  dfcoord= readRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds"))[[1]] %>% dplyr::select(contains("ude")), 
  snp_set = go_allsnps,
  type="NFI",
  point_size = 0.5,
  go_limits = go_limits,
  legend_position = c(0.85,0.15),
  legend_box_background = "gray80",
  y_limits = c(35, 51))

# We save the figure for the Supplementary Information:
p_SI <- p + theme(plot.title = element_blank())
ggsave(here("figs/LFMM/NFI_GOmap_AllSNPs_SI.pdf"), p_SI, width=8,height=8, device="pdf")

p

5.2.0.4 Validation - Common gardens

Code
# Common garden information
# =========================
cg_clim <- readRDS(here("data/ClimaticData/CommonGardens/ClimateCG.rds")) %>%  dplyr::select(cg,any_of(clim_var))
cg_coord <- readRDS(here("data/ClimaticData/CommonGardens/ClimateCG.rds")) %>% dplyr::select(cg,contains("ude"))
cg_names <- unique(cg_coord$cg)


# Climatic datasets for the predictions
# =====================================

# In these datasets, one row per prediction
# We want GO predictions for each combination pop * CG, so 34 * 5 = 160 rows

# climatic data of the populations
# each row (ie each population climatic data) is repeated 5 times
clim_pop <- generate_scaled_clim_datasets(clim_var, clim_pred = cg_clim)[[1]] %>% 
  slice(rep(1:n(), each = nrow(cg_clim)))

# Climatic data in the common gardens scaled with the mean and variance of the ref-period climatic data
# each climatic dataset is repeated 34 times and then rows are combined
clim_cg <- generate_scaled_clim_datasets(clim_var, clim_pred = cg_clim)[[2]] %>% 
  replicate(nrow(pop_coord),.,simplify=F) %>% 
  bind_rows()
Code
# Genomic offset predictions in the common gardens
# ================================================
go_allsnps$go_cg <- bind_cols(pop=clim_pop[,1],cg=clim_cg[,1]) %>% 
  mutate(go=genetic.gap(input = geno,
                     env = clim_ref,
                     new.env = clim_pop[,-1],
                     pred.env = clim_cg[,-1],
                     K = 6)$offset) %>% 
  pivot_wider(names_from = cg, values_from = go)



# Mapping
# ========
go_maps_cg <- lapply(cg_names, function(cg_name){
  
# Find minimum and maximum values of genomic offset for the maps
go_limits <-  go_allsnps$go_cg[[cg_name]] %>%  range()
go_limits[[1]] <- 0
  
p <- make_go_map(dfcoord=pop_coord,
                    snp_set = go_allsnps,
                    point_size = 3,
                    type="CG",
                    go_limits = go_limits,
                    cg_name=cg_name,
                    cg_coord=cg_coord)

# Saving the figures for the Supplementary Information
p_SI <- p + theme(plot.title = element_blank())
ggsave(filename = here(paste0("figs/LFMM/GOmap_allSNPs_",cg_name,"_SI.pdf")), p_SI, width=7,height=5, device="pdf")

p
})

plot_grid(plotlist=go_maps_cg)

We save the genomic offset predictions for comparison with other methods.

Code
go_allsnps$go <- lapply(ggap_all_snps, function(x) x$go)

go_allsnps %>% saveRDS(file=here("outputs/LFMM/go_predictions_allsnps.rds"))

5.3 GO predictions using sets of SNPs

We estimate the genetic gap for three sets of SNPs:

  • two sets of candidates SNPs.

  • One set of randomly selected SNPs.

Code
snp_sets <- readRDS(here("outputs/list_snp_sets.rds"))
Code
snp_sets <- lapply(snp_sets, function(snp_set){
  
snp_set$go <- lapply(list_clim_pred, function(clim_pred){
  
ggap <- genetic.gap(input = geno,
                    env = clim_ref,
                    pred.env = clim_pred,
                    candidate.loci = which(names(geno) %in% snp_set$set_snps),
                    K = 6)

return(unique(ggap$offset))

}) 
return(snp_set)
})

5.3.1 Relatioship with Euclidean distance

Code
par(mfrow=c(1,2))

lapply(snp_sets, function(snp_set) {

lapply(names(list_clim_pred), function(gcm){
  
make_eucli_plot(
  X = list_dist_env[[gcm]],
  Y = sqrt(snp_set$go[[gcm]]),
  colors = gps$color_main_gp_pop,
  color_names = gps$main_gp_pop,
  ylab = "sqrt(genomic offset)",
  legend_position="topleft",
  plot_title = paste0(snp_set$set_name," - ", gcm))

})
}) 

Code
# We generate scatter plots for the Supplementary Information.
# ============================================================

range_eucli <- list_dist_env %>% unlist() %>% range()

# Run the function
# ================
lapply(snp_sets[c(1,3)], function(set_i) {

  max_go <- set_i$go %>% unlist() %>% max()
  max_go <- max_go + 0.001
  
p <- lapply(names(list_dist_env), function(gcm){
  
make_ggscatterplot(
  x = list_dist_env[[gcm]],
  y = set_i$go[[gcm]],
  title=gcm,
  range_eucli = range_eucli,
  max_go = max_go)

})

# remove y-labels to graphs in the second column
p[[2]] <- p[[2]] + ylab("")
p[[4]] <- p[[4]] + ylab("")


# remove x-labels to graphs in the second and third rows
p[[1]] <- p[[1]] + xlab("")
p[[2]] <- p[[2]] + xlab("")
p[[3]] <- p[[3]] + xlab("")

p[[6]] <- get_legend(p[[1]])

for(i in 1:5){p[[i]] <- p[[i]]  +  theme(legend.position = "none")} 


plot_grid(plotlist=p, nrow = 3) %>% 
  ggsave(here(paste0("figs/LFMM/ScatterPlotEucliDistance_",set_i$set_code,".pdf")), 
         .,
         width=7,
         height=8,
         device="pdf")

})
$cand
[1] "C:/Users/jularc/Documents/GOPredEvalPinpin/GOPredEvalPinpin/figs/LFMM/ScatterPlotEucliDistance_cand.pdf"

$control
[1] "C:/Users/jularc/Documents/GOPredEvalPinpin/GOPredEvalPinpin/figs/LFMM/ScatterPlotEucliDistance_control.pdf"

5.3.2 Eigenvalues

Let’s look the the barplots of the eigenvalues.

Comment: The eigenvalues are properties of the model fit, and so only depends on the genomic data and the climate across the reference period. In other words, they do not vary across the different GCMs (ie different future climatic data). That’s why we generate only three barplots, one for each set of SNPs.

Code
par(mfrow=c(1,3))

lapply(snp_sets, function(snp_set){
  
ggap <- genetic.gap(input = geno,
                    env = clim_ref,
                    pred.env = list_clim_pred[[1]], # can be any element of the list, it does not matter
                    candidate.loci = which(names(geno) %in% snp_set$set_snps),
                    K = 6)

barplot(ggap$eigenvalues, col = "orange", xlab = "Axes", ylab = "Eigenvalues",main=snp_set$set_name)
}) 

Correlations between the squared Euclidean climatic distance and the genetic gap:

Code
lapply(snp_sets, function(snp_set) {

lapply(names(list_clim_pred), function(gcm){
  
cor(sqrt(snp_set$go[[gcm]]),list_dist_env[[gcm]]^2)
    
  }) %>% 
    setNames(names(list_clim_pred)) %>% 
    bind_rows()
  
}) %>% 
  bind_rows(.id="SNP set") %>% 
  kable_mydf()
SNP set GFDL-ESM4 IPSL-CM6A-LR MPI-ESM1-2-HR MRI-ESM2-0 UKESM1-0-LL
cand 0.70 0.74 0.86 0.77 0.80
cand_corrected 0.63 0.69 0.82 0.73 0.79
control 0.74 0.83 0.90 0.79 0.82

5.3.3 Comparing GO predictions

We look at the correlation across the different genomic offset predictions at the location of the populations, i.e. those based on all SNPs and those based on sets of candidates or control SNPs.

Code
par(mfrow=c(1,3))

lapply(names(go_allsnps$go), function(gcm){
  
lapply(snp_sets, function(x) x$go[[gcm]]) %>% 
  as_tibble() %>%
  mutate(allsnps = go_allsnps$go[[gcm]]) %>% 
  cor() %>% 
  corrplot(method = 'number',type = 'lower', 
           diag = FALSE,mar=c(0,0,2,0),
           title=gcm,
           number.cex=2,tl.cex=1.5)
  
})

5.3.4 Maps

Code
# Find minimum and maximum values of genomic offset for the maps
go_limits <- lapply(snp_sets, function(x) {
lapply(names(list_clim_pred), function(gcm){
x$go[[gcm]]
}) %>%  unlist()
}) %>%  unlist() %>% range()
# The minimum GO value is very very small, almost zero, so we fix it to zero.
go_limits[[1]] <- 0


# Generate the maps for each set of SNPs and each GCM
lapply(snp_sets, function(x) {

go_maps <- lapply(names(list_clim_pred), function(gcm){
  
make_go_map(dfcoord=pop_coord,
            snp_set = x,
            gcm=gcm,
            ggtitle=gcm,
            go_limits = go_limits,
            point_size = 3)

})

legend_maps  <- get_legend(go_maps[[1]])

go_maps <- lapply(go_maps, function(y) y + theme(legend.position = "none"))

go_maps$legend_maps <- legend_maps

go_maps <-plot_grid(plotlist=go_maps)

# =====================================================
# We save the figures for the Supplementary Information
# =====================================================
if(x$set_code=="cand"){ 
  ggsave(here("figs/LFMM/GOMaps_PopLocations_CandidateSNPs_SI.pdf"), go_maps, width=10,height=6, device="pdf")
} else if(x$set_code=="control"){
    ggsave(here("figs/LFMM/GOMaps_PopLocations_ControlSNPs_SI.pdf"), go_maps, width=10,height=6, device="pdf")
}


# =========
# Add title
# =========
title <- ggdraw() + 
  draw_label(
    x$set_name,
    fontface = 'bold',
    x = 0,
    hjust = 0
  ) +
  theme(plot.margin = margin(0, 0, 0, 7))

# merge title and plots
plot_grid(
  title, go_maps,
  ncol = 1,
  rel_heights = c(0.1, 1))

  })

For each GCM, we attribute the value 1 to the top five populations with the highest genomic offset and we attribute the value 0 to the other populations. We then count the number of 1 for each population, which gives the table and map below:

Code
source(here("scripts/functions/make_high_go_pop_maps.R"))

high_go_pops <- make_high_go_pop_maps(pop_coord=pop_coord,
                                      list_go = snp_sets$cand$go,
                                      ggtitle="LFMM",
                                      nb_id_pop = 5) # number of selected populations

saveRDS(high_go_pops, file = here("outputs/LFMM/high_go_pops.rds"))

high_go_pops[[1]] %>% kable_mydf()
pop longitude latitude GFDL-ESM4 IPSL-CM6A-LR MPI-ESM1-2-HR MRI-ESM2-0 UKESM1-0-LL sum_go
PIE 9.04 41.97 0 0 0 0 0 0
PET -1.30 44.06 0 0 0 0 0 0
MIM -1.30 44.13 0 0 0 0 0 0
ARM -6.46 43.30 0 0 0 0 0 0
CAD -6.42 43.54 0 0 0 0 0 0
ALT -6.49 43.28 0 0 0 0 0 0
CAS -6.98 43.50 0 0 0 0 0 0
LAM -6.22 43.56 0 0 0 0 0 0
SIE -6.49 43.53 0 0 0 0 0 0
HOU -1.15 45.18 0 0 0 0 0 0
PLE -2.34 47.78 0 0 0 0 0 0
STJ -2.03 46.76 0 0 0 0 0 0
PUE -6.63 43.55 0 0 0 0 0 0
VER -1.09 45.55 0 0 0 0 0 0
SEG -8.45 42.82 0 0 0 0 0 0
OLO -1.83 46.57 0 0 0 0 0 0
BON -1.66 39.99 0 0 0 0 0 0
SAC -8.36 42.12 0 0 0 0 0 0
PIA 9.46 42.02 0 0 0 0 0 0
OLB -0.62 40.17 0 0 0 0 0 0
SAL -3.06 41.83 0 0 0 0 0 0
CEN -4.49 40.28 0 0 0 0 0 0
BAY -2.88 41.52 0 0 0 0 0 0
LEI -8.96 39.78 0 0 0 1 0 1
QUA -0.36 38.97 0 0 0 1 0 1
VAL -4.31 40.52 0 0 1 0 0 1
CAR -4.28 41.17 0 1 0 0 0 1
CUE -4.48 41.37 1 0 0 0 0 1
ARN -5.12 40.19 1 0 0 0 0 1
COC -4.50 41.25 0 1 0 0 1 2
TAM -5.02 33.60 0 1 1 0 1 3
ORI -2.35 37.53 1 0 1 1 1 4
MAD -5.23 35.18 1 1 1 1 1 5
COM -3.95 36.83 1 1 1 1 1 5
Code
high_go_pops[[2]]

5.3.5 Validation - NFI plots

Code
# Calculate the genomic offset for the NFI plots
snp_sets <- lapply(snp_sets, function(snp_set){
  
ggap <- genetic.gap(input = geno,
                    env = clim_ref,
                    new.env = nfi_dfs$clim_ref[,-1],
                    pred.env = nfi_dfs$clim_pred[,-1],
                    candidate.loci = which(names(geno) %in% snp_set$set_snps),
                    K = 6)

snp_set$go_nfi <- ggap$offset

return(snp_set)

})

# checking missing data
# lapply(snp_sets, function(x) sum(is.na(x$go_nfi)))

# Find minimum and maximum values of genomic offset for the maps
go_limits <- lapply(snp_sets, function(snp_set) snp_set$go_nfi) %>%  unlist() %>% range()
# The minimum GO value is very very small, almost zero, so we fix it to zero.
go_limits[[1]] <- 0


# map genomic offset predictions in the NFI plots 
lapply(snp_sets, function(x) {
  
p <- make_go_map(
  dfcoord= readRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds"))[[1]] %>% dplyr::select(contains("ude")), 
  snp_set = x,
  type="NFI",
  point_size = 0.5,
  go_limits = go_limits,
  legend_position = c(0.85,0.15),
  legend_box_background = "gray80",
  y_limits = c(35, 51))
  
# Figure for the SI
# =================
if(x$set_code=="cand"){ 
  p_SI <- p + theme(plot.title = element_blank())
  ggsave(here("figs/LFMM/NFI_GOmap_CandidateSNPs_SI.pdf"), p_SI, width=8,height=8, device="pdf")
  
  } else if(x$set_code=="control"){
    p_SI <- p + theme(plot.title = element_blank())
    ggsave(here("figs/LFMM/NFI_GOmap_ControlSNPs_SI.pdf"), p_SI, width=8,height=8, device="pdf")
    }

# Show maps in the Quarto document
# ================================
p
  
  })

We look at the correlation across the different genomic offset predictions in the NFI plots, i.e. those based on all SNPs and those based on sets of candidates or control SNPs.

Code
lapply(snp_sets, function(x) x$go_nfi) %>% 
  as_tibble() %>%
  mutate(allsnps = go_allsnps$go_nfi) %>% 
  cor() %>% 
  corrplot(method = 'number',type = 'lower', diag = FALSE,mar=c(0,0,2,0),
               number.cex=2,tl.cex=1.5)

5.3.6 Validation - Common gardens

Code
# Predict genomic offset of each population when transplanted in the climate of the common gardens
snp_sets <- lapply(snp_sets, function(snp_set){

ggap <- genetic.gap(input = geno,
                    env = clim_ref,
                    new.env = clim_pop[,-1],
                    pred.env = clim_cg[,-1],
                    candidate.loci = which(names(geno) %in% snp_set$set_snps),
                    K = 6)

snp_set$go_cg <- bind_cols(pop=clim_pop[,1],cg=clim_cg[,1]) %>% 
  mutate(go=ggap$offset) %>% 
  pivot_wider(names_from = cg, values_from = go)

return(snp_set)
})



# Mapping
# ========
go_maps_cg <- lapply(cg_names, function(cg_name){
  
# Find minimum and maximum values of genomic offset for the maps
go_limits <- lapply(snp_sets, function(snp_set) snp_set$go_cg[[cg_name]]) %>%  unlist() %>% range()
go_limits[[1]] <- 0
  
p <- lapply(snp_sets, function(x) {
  
 p <- make_go_map(dfcoord=pop_coord,
              snp_set = x,
              point_size = 3,
              type="CG",
              go_limits = go_limits,
              cg_name=cg_name,
              cg_coord=cg_coord)
 
 
 # We save the figures for the Supplementary Information
 if(x$set_code=="cand") {
   p_SI <- p + theme(plot.title = element_blank(),
                     legend.position = "none")
   
   p_SI %>% 
   ggsave(filename = here(paste0("figs/LFMM/GOmap_",x$set_code,"_",cg_name,"_SI.pdf")), device = "pdf",width=5,height=5)}
  
  if(x$set_code=="control") {
   p_SI <- p + theme(plot.title = element_blank())
   
   p_SI %>% 
   ggsave(filename = here(paste0("figs/LFMM/GOmap_",x$set_code,"_",cg_name,"_SI.pdf")), device = "pdf",width=7,height=5)}
 
 
 
 p
 
  })

plot_grid(plotlist=p,nrow=1)
  
}) %>% setNames(cg_names)

pdf(here("figs/LFMM/GOmaps_CGs.pdf"), width=15,height=6)
lapply(go_maps_cg, function(x) x)
dev.off()

# show maps
lapply(go_maps_cg, function(x) x)

We look at the correlation across the different genomic offset predictions in the common gardens, i.e. those based on all SNPs and those based on sets of candidates or control SNPs.

Code
par(mfrow=c(1,3))

lapply(cg_names, function(cg_name){

list_go_cg <- lapply(snp_sets, function(x) x$go_cg)
list_go_cg$allsnps <- go_allsnps$go_cg

lapply(list_go_cg, function(set){
  set[[cg_name]]
}) %>% 
    as_tibble() %>% 
    cor() %>% 
    corrplot(method = 'number',type = 'lower', 
             diag = FALSE,mar=c(0,0,2,0),
             title=str_to_title(cg_name),
             number.cex=2,tl.cex=1.5)

})

We save the genomic offset predictions for comparison with the other methods.

Code
snp_sets %>% saveRDS(file=here("outputs/LFMM/go_predictions_snpsets.rds"))

6 Corrected allele frequencies

In a similar way to Gain et al. (2023) and Capblancq et al. (2023), we want to use allele frequencies corrected for population structure in the Gradient Forest analysis. Here we use the outputs from a latent factor mixed model (LFMM) to extract the corrected allele frequencies.

To understand, here the model applied by the lfmm2 function:

\[\mathbf{Y}_{fit} = \mathbf{XB}^T + \mathbf{UV}^T\] where \(\mathbf{B}\), \(\mathbf{U}\) and \(\mathbf{V}\) are the effect size, and factor and loading matrices adjusted by the lfmm2 algorithm from the set of current environmental variables included in the matrix \(\mathbf{X}\). \(\mathbf{B}\) is a matrix of dimension \(p \times b\), with \(p\) the number of genetic markers and \(b\) the number of environmental variables. \(\mathbf{U}\) is a matrix of dimension \(n \times K\), with \(n\) the number of individuals (ie genotypes) and \(K\) the number of latent factors. \(\mathbf{V}\) is a matrix of dimension \(p \times K\). \(\mathbf{X}\) is a matrix of dimension \(n \times b\). \(\mathbf{Y}_{fit}\) is a matrix of dimension \(n \times p\).

We want a matrix of allele frequencies corrected for population structure, so basically:

\[\mathbf{Y}_{corrected} = \mathbf{Y}_{fit} - \mathbf{UV}^T = \mathbf{XB}^T\] Below, we do the matrix multiplication of the matrix \(\mathbf{X}\) (dimension \(n \times b\)) and the transpose of the matrix \(\mathbf{B}\) (dimension \(b \times p\)) to obtain the matrix \(\mathbf{Y}_{corrected}\) (dimension \(n \times p\)).

Code
# we load the imputed genomic data with allele counts (and with MAF)
geno_withmaf <-  read.csv(here("data/DryadRepo/ImputedGenomicData_AlleleCounts_withmaf.csv")) %>% 
  column_to_rownames("snp_ID") %>% 
  t()

# We run LFMM 
mod_withmaf <- lfmm2(input = geno_withmaf,
                     env = as.matrix(clim_ref),
                     K = 6,
                     effect.sizes = T) # to return the matrix B (the matrix of effect sizes)

# Matrix multiplication of matrix X and the transpose of matrix B
geno_corrected <- as.matrix(clim_ref) %*% t(mod_withmaf@B) %>% 
  set_rownames(rownames(geno_withmaf)) %>% 
  as.data.frame() %>% 
  rownames_to_column(var="clon")

# save the matrix for future analyses with and without MAF
geno_corrected %>% write_csv(here("data/GenomicData/CorrectedAlleleFrequencies_withmaf.csv"))

geno_corrected %>% 
  dplyr::select(clon,all_of(colnames(geno))) %>% 
  write_csv(here("data/GenomicData/CorrectedAlleleFrequencies_withoutmaf.csv"))

geno_corrected[1:10,1:10] %>%
  kable_mydf(boldfirstcolumn = 1)
clon snp_1 snp_2 snp_3 snp_4 snp_5 snp_6 snp_7 snp_8 snp_9
ALT10 0.01 0.02 0.13 -0.03 0.08 0.14 0.01 0.02 -0.05
ALT2 0.01 0.02 0.13 -0.03 0.08 0.14 0.01 0.02 -0.05
ALT3 0.01 0.02 0.13 -0.03 0.08 0.14 0.01 0.02 -0.05
ALT4 0.01 0.02 0.13 -0.03 0.08 0.14 0.01 0.02 -0.05
ALT5 0.01 0.02 0.13 -0.03 0.08 0.14 0.01 0.02 -0.05
ALT6 0.01 0.02 0.13 -0.03 0.08 0.14 0.01 0.02 -0.05
ALT8 0.01 0.02 0.13 -0.03 0.08 0.14 0.01 0.02 -0.05
ALT9 0.01 0.02 0.13 -0.03 0.08 0.14 0.01 0.02 -0.05
ARM1 0.01 0.01 0.16 -0.05 0.08 0.15 0.00 0.01 -0.07
ARM10 0.01 0.01 0.16 -0.05 0.08 0.15 0.00 0.01 -0.07

7 Session information

Code
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United Kingdom.utf8
 ctype    English_United Kingdom.utf8
 tz       Europe/London
 date     2023-10-03
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package           * version date (UTC) lib source
 assertthat          0.2.1   2019-03-21 [1] CRAN (R 4.2.2)
 backports           1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
 bit                 4.0.5   2022-11-15 [1] CRAN (R 4.2.2)
 bit64               4.0.5   2020-08-30 [1] CRAN (R 4.2.2)
 broom               1.0.2   2022-12-15 [1] CRAN (R 4.2.2)
 cachem              1.0.6   2021-08-19 [1] CRAN (R 4.2.2)
 callr               3.7.3   2022-11-02 [1] CRAN (R 4.2.2)
 cellranger          1.1.0   2016-07-27 [1] CRAN (R 4.2.2)
 class               7.3-20  2022-01-16 [2] CRAN (R 4.2.2)
 classInt            0.4-8   2022-09-29 [1] CRAN (R 4.2.2)
 cli                 3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
 codetools           0.2-18  2020-11-04 [2] CRAN (R 4.2.2)
 colorspace          2.0-3   2022-02-21 [1] CRAN (R 4.2.2)
 corrplot          * 0.92    2021-11-18 [1] CRAN (R 4.2.2)
 cowplot           * 1.1.1   2020-12-30 [1] CRAN (R 4.2.2)
 crayon              1.5.2   2022-09-29 [1] CRAN (R 4.2.2)
 DBI                 1.1.3   2022-06-18 [1] CRAN (R 4.2.2)
 dbplyr              2.2.1   2022-06-27 [1] CRAN (R 4.2.2)
 devtools            2.4.5   2022-10-11 [1] CRAN (R 4.2.2)
 digest              0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
 dplyr             * 1.0.10  2022-09-01 [1] CRAN (R 4.2.2)
 e1071               1.7-13  2023-02-01 [1] CRAN (R 4.2.2)
 ellipsis            0.3.2   2021-04-29 [1] CRAN (R 4.2.2)
 evaluate            0.19    2022-12-13 [1] CRAN (R 4.2.2)
 fansi               1.0.3   2022-03-24 [1] CRAN (R 4.2.2)
 farver              2.1.1   2022-07-06 [1] CRAN (R 4.2.2)
 fastmap             1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
 forcats           * 0.5.2   2022-08-19 [1] CRAN (R 4.2.2)
 fs                  1.6.0   2023-01-23 [1] CRAN (R 4.2.2)
 gargle              1.2.1   2022-09-08 [1] CRAN (R 4.2.2)
 generics            0.1.3   2022-07-05 [1] CRAN (R 4.2.2)
 ggplot2           * 3.4.0   2022-11-04 [1] CRAN (R 4.2.2)
 glue                1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
 googledrive         2.0.0   2021-07-08 [1] CRAN (R 4.2.2)
 googlesheets4       1.0.1   2022-08-13 [1] CRAN (R 4.2.2)
 gtable              0.3.3   2023-03-21 [1] CRAN (R 4.2.3)
 haven               2.5.1   2022-08-22 [1] CRAN (R 4.2.2)
 here              * 1.0.1   2020-12-13 [1] CRAN (R 4.2.2)
 highr               0.10    2022-12-22 [1] CRAN (R 4.2.2)
 hms                 1.1.2   2022-08-19 [1] CRAN (R 4.2.2)
 htmltools           0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
 htmlwidgets         1.6.0   2022-12-15 [1] CRAN (R 4.2.2)
 httpuv              1.6.8   2023-01-12 [1] CRAN (R 4.2.2)
 httr                1.4.4   2022-08-17 [1] CRAN (R 4.2.2)
 jsonlite            1.8.4   2022-12-06 [1] CRAN (R 4.2.2)
 kableExtra        * 1.3.4   2021-02-20 [1] CRAN (R 4.2.2)
 KernSmooth          2.23-20 2021-05-03 [2] CRAN (R 4.2.2)
 knitr             * 1.41    2022-11-18 [1] CRAN (R 4.2.2)
 labeling            0.4.2   2020-10-20 [1] CRAN (R 4.2.0)
 later               1.3.0   2021-08-18 [1] CRAN (R 4.2.2)
 latex2exp         * 0.9.6   2022-11-28 [1] CRAN (R 4.2.2)
 lattice             0.20-45 2021-09-22 [2] CRAN (R 4.2.2)
 LEA               * 3.11.4  2023-03-22 [1] Github (bcm-uga/LEA@d7505ab)
 lifecycle           1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
 lubridate           1.9.0   2022-11-06 [1] CRAN (R 4.2.2)
 magrittr          * 2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
 Matrix              1.5-1   2022-09-13 [2] CRAN (R 4.2.2)
 memoise             2.0.1   2021-11-26 [1] CRAN (R 4.2.2)
 mgcv                1.8-41  2022-10-21 [2] CRAN (R 4.2.2)
 mime                0.12    2021-09-28 [1] CRAN (R 4.2.0)
 miniUI              0.1.1.1 2018-05-18 [1] CRAN (R 4.2.2)
 modelr              0.1.10  2022-11-11 [1] CRAN (R 4.2.2)
 munsell             0.5.0   2018-06-12 [1] CRAN (R 4.2.2)
 nlme                3.1-160 2022-10-10 [2] CRAN (R 4.2.2)
 pillar              1.9.0   2023-03-22 [1] CRAN (R 4.2.3)
 pkgbuild            1.4.0   2022-11-27 [1] CRAN (R 4.2.2)
 pkgconfig           2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
 pkgload             1.3.2   2022-11-16 [1] CRAN (R 4.2.2)
 plyr                1.8.8   2022-11-11 [1] CRAN (R 4.2.2)
 prettyunits         1.1.1   2020-01-24 [1] CRAN (R 4.2.2)
 processx            3.8.0   2022-10-26 [1] CRAN (R 4.2.2)
 profvis             0.3.7   2020-11-02 [1] CRAN (R 4.2.2)
 promises            1.2.0.1 2021-02-11 [1] CRAN (R 4.2.2)
 proxy               0.4-27  2022-06-09 [1] CRAN (R 4.2.2)
 ps                  1.7.2   2022-10-26 [1] CRAN (R 4.2.2)
 purrr             * 1.0.0   2022-12-20 [1] CRAN (R 4.2.2)
 qvalue              2.30.0  2022-11-01 [1] Bioconductor
 R6                  2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
 ragg                1.2.4   2022-10-24 [1] CRAN (R 4.2.2)
 raster            * 3.6-11  2022-11-28 [1] CRAN (R 4.2.2)
 RColorBrewer      * 1.1-3   2022-04-03 [1] CRAN (R 4.2.0)
 Rcpp                1.0.10  2023-01-22 [1] CRAN (R 4.2.2)
 readr             * 2.1.3   2022-10-01 [1] CRAN (R 4.2.2)
 readxl            * 1.4.1   2022-08-17 [1] CRAN (R 4.2.2)
 remotes             2.4.2   2021-11-30 [1] CRAN (R 4.2.2)
 reprex              2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
 reshape2          * 1.4.4   2020-04-09 [1] CRAN (R 4.2.2)
 rlang               1.1.0   2023-03-14 [1] CRAN (R 4.2.3)
 rmarkdown           2.19    2022-12-15 [1] CRAN (R 4.2.2)
 rnaturalearth     * 0.3.2   2023-01-23 [1] CRAN (R 4.2.3)
 rnaturalearthdata   0.1.0   2017-02-21 [1] CRAN (R 4.2.3)
 rprojroot           2.0.3   2022-04-02 [1] CRAN (R 4.2.2)
 rstudioapi          0.14    2022-08-22 [1] CRAN (R 4.2.2)
 rvest               1.0.3   2022-08-19 [1] CRAN (R 4.2.2)
 scales              1.2.1   2022-08-20 [1] CRAN (R 4.2.2)
 sessioninfo         1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
 sf                  1.0-9   2022-11-08 [1] CRAN (R 4.2.2)
 shiny               1.7.4   2022-12-15 [1] CRAN (R 4.2.2)
 sp                * 1.5-1   2022-11-07 [1] CRAN (R 4.2.2)
 stringi             1.7.8   2022-07-11 [1] CRAN (R 4.2.1)
 stringr           * 1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
 svglite             2.1.0   2022-02-03 [1] CRAN (R 4.2.2)
 systemfonts         1.0.4   2022-02-11 [1] CRAN (R 4.2.2)
 terra               1.6-47  2022-12-02 [1] CRAN (R 4.2.2)
 textshaping         0.3.6   2021-10-13 [1] CRAN (R 4.2.2)
 tibble            * 3.1.8   2022-07-22 [1] CRAN (R 4.2.2)
 tidyr             * 1.2.1   2022-09-08 [1] CRAN (R 4.2.2)
 tidyselect          1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
 tidyverse         * 1.3.2   2022-07-18 [1] CRAN (R 4.2.2)
 timechange          0.1.1   2022-11-04 [1] CRAN (R 4.2.2)
 tzdb                0.3.0   2022-03-28 [1] CRAN (R 4.2.2)
 units               0.8-1   2022-12-10 [1] CRAN (R 4.2.2)
 urlchecker          1.0.1   2021-11-30 [1] CRAN (R 4.2.2)
 usethis             2.1.6   2022-05-25 [1] CRAN (R 4.2.2)
 utf8                1.2.2   2021-07-24 [1] CRAN (R 4.2.2)
 vctrs               0.5.1   2022-11-16 [1] CRAN (R 4.2.2)
 viridisLite         0.4.2   2023-05-02 [1] CRAN (R 4.2.3)
 vroom               1.6.0   2022-09-30 [1] CRAN (R 4.2.2)
 webshot             0.5.4   2022-09-26 [1] CRAN (R 4.2.2)
 withr               2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
 xfun                0.36    2022-12-21 [1] CRAN (R 4.2.2)
 xml2                1.3.3   2021-11-30 [1] CRAN (R 4.2.2)
 xtable            * 1.8-4   2019-04-21 [1] CRAN (R 4.2.2)
 yaml                2.3.6   2022-10-18 [1] CRAN (R 4.2.2)

 [1] C:/Users/jularc/AppData/Local/R/win-library/4.2
 [2] C:/Program Files/R/R-Install/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

References

Capblancq, Thibaut, Susanne Lachmuth, Matthew C Fitzpatrick, and Stephen R Keller. 2023. “From Common Gardens to Candidate Genes: Exploring Local Adaptation to Climate in Red Spruce.” New Phytologist 237 (5): 1590–1605.
Caye, Kevin, Basile Jumentier, Johanna Lepeule, and Olivier François. 2019. “LFMM 2: Fast and Accurate Inference of Gene-Environment Associations in Genome-Wide Studies.” Molecular Biology and Evolution 36 (4): 852–60.
Frichot, Eric, Sean D Schoville, Guillaume Bouchard, and Olivier François. 2013. “Testing for Associations Between Loci and Environmental Gradients Using Latent Factor Mixed Models.” Molecular Biology and Evolution 30 (7): 1687–99.
Gain, Clément, and Olivier François. 2021. “LEA 3: Factor Models in Population Genetics and Ecological Genomics with R.” Molecular Ecology Resources 21 (8): 2738–48.
Gain, Clément, Benedicte Rhone, Philippe Cubry, Israfel Salazar, Florence Forbes, Yves Vigouroux, Flora Jay, and Olivier François. 2023. “A Quantitative Theory for Genomic Offset Statistics.” bioRxiv, 2023–01.
Jaramillo-Correa, Juan-Pablo, Isabel Rodrı́guez-Quilón, Delphine Grivet, Camille Lepoittevin, Federico Sebastiani, Myriam Heuertz, Pauline H Garnier-Géré, et al. 2015. “Molecular Proxies for Climate Maladaptation in a Long-Lived Tree (Pinus Pinaster Aiton, Pinaceae).” Genetics 199 (3): 793–807.
Pritchard, Jonathan K, Matthew Stephens, and Peter Donnelly. 2000. “Inference of Population Structure Using Multilocus Genotype Data.” Genetics 155 (2): 945–59.